Homogeneity of Cluster Ensembles

نویسنده

  • Brijnesh J. Jain
چکیده

The expectation and the mean of partitions generated by a cluster ensemble are not unique in general. This issue poses challenges in statistical inference and cluster stability. In this contribution, we state sufficient conditions for uniqueness of expectation and mean. The proposed conditions show that a unique mean is neither exceptional nor generic. To cope with this issue, we introduce homogeneity as a measure of how likely is a unique mean for a sample of partitions. We show that homogeneity is related to cluster stability. This result points to a possible conflict between cluster stability and diversity in consensus clustering. To assess homogeneity in a practical setting, we propose an efficient way to compute a lower bound of homogeneity. Empirical results using the k-means algorithm suggest that uniqueness of the mean partition is not exceptional for real-world data. Moreover, for samples of high homogeneity, uniqueness can be enforced by increasing the number of data points or by removing outlier partitions. In a broader context, this contribution can be placed as a further step towards a statistical theory of partitions.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Coherent Transport of Single Photon in a Quantum Super-cavity with Mirrors Composed of Λ-Type Three-level Atomic Ensembles

In this paper, we study the coherent transport of single photon in a coupled resonator waveguide (CRW) where two threelevel Λ-type atomic ensembles are embedded in two separate cavities. We show that it is possible to control the photon transmission and reflection coefficients by using classical control fields. In particular, we find that the total photon transmission and reflection are achieva...

متن کامل

Bayesian Cluster Ensembles

Cluster ensembles provide a framework for combining multiple base clusterings of a dataset to generate a stable and robust consensus clustering. There are important variants of the basic cluster ensemble problem, notably including cluster ensembles with missing values, as well as row-distributed or column-distributed cluster ensembles. Existing cluster ensemble algorithms are applicable only to...

متن کامل

Cluster-Based Cumulative Ensembles

In this paper, we propose a cluster-based cumulative representation for cluster ensembles. Cluster labels are mapped to incrementally accumulated clusters, and a matching criterion based on maximum similarity is used. The ensemble method is investigated with bootstrap re-sampling, where the k-means algorithm is used to generate high granularity clusterings. For combining, group average hierarch...

متن کامل

Cluster Ensembles for High Dimensional Clustering: An Empirical Study

This paper studies cluster ensembles for high dimensional data clustering. We examine three different approaches to constructing cluster ensembles. To address high dimensionality, we focus on ensemble construction methods that build on two popular dimension reduction techniques, random projection and principal component analysis (PCA). We present evidence showing that ensembles generated by ran...

متن کامل

A CLUE for CLUster Ensembles

Cluster ensembles are collections of individual solutions to a given clustering problem which are useful or necessary to consider in a wide range of applications. The R package ̃clue provides an extensible computational environment for creating and analyzing cluster ensembles, with basic data structures for representing partitions and hierarchies, and facilities for computing on these, including...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1602.02543  شماره 

صفحات  -

تاریخ انتشار 2016